Abstract: We prove that for a set of communicating agents to
compute the average of their initial positions (average consensus
problem), the optimal topology of communication is given by
a de Bruijn graph. Consensus is then reached in finitely
many steps. A more general family of strategies, constructed
by block Kronecker products, is investigated and compared to
Cayley strategies.

I. Introduction 

Coordination algorithms for multiple autonomous vehicles
and decentralized estimation techniques for handling data
coming from distributed sensor networks have attracted considerable
attention in recent years. This is mainly motivated by the fact that
both coordinated control and distributed estimation have
applications in many areas, such as coordinated flocking of
mobile vehicles [26], [27], cooperative control of unmanned
air and underwater vehicles [4], [3], multi-vehicle tracking
with limited sensor information [19], monitoring of very large
scale areas with fine resolution, and collaborative estimation
over wireless sensor networks [24].

Typically, both in coordinated control and in distributed 
estimation the agents need to communicate data in order 
to execute a task. In particular they may need to agree on 
the value of certain coordination state variables. One expects 
that, in order to achieve coordination, the variables shared by 
the agents, converge to a common value, asymptotically. The 
problem of designing controllers that lead to such asymptotic
coordination is called coordinated consensus; see for example
[12], [20], [15] and references therein. Generalisations to
high-order consensus [22] and nonholonomic agents [18],
[11], [28] have also been explored. One of the simplest
consensus problems, and one of the most studied, consists
in starting from systems described by an integrator and
in finding a feedback control yielding consensus, namely
driving all the states to the same value [20]. The information
exchange is modeled by a directed graph describing
between which pairs of agents data transmission is allowed.
The situation mostly treated in the literature is when each
agent can communicate its state to the
other agents that are positioned inside a neighborhood [26],
[15] and the communication network is time-varying [27],
[15]. Robustness to communication link failure [8] and the
effects of time delays [20] have been considered recently.
Randomly time-varying networks have also been analyzed
in [14]. Moreover, a first analysis involving quantized data
transmission has been proposed in [7], [16]. In this paper we
consider the consensus problem from a different perspective.
We are interested in characterizing the relationship between
the amount of information exchanged by the agents and the 
achievable control performance. More precisely we assume 
that N agents are given initial positions in Euclidean
space, and move in discrete-time in order to reach the average 
of their initial positions. This problem is also called average 
coordinated consensus. Every agent asks several agents their 
position before taking a decision to modify its own position. 
We impose that, in order to limit costs of communication, 
every agent communicates with only v agents (including 
itself), where v < N. This means that in the graph describing 
the communications between agents, the max in-degree is 
at most v. In this paper, we exhibit a family of strategies
for solving this problem based on de Bruijn graphs and
we prove that, according to a suitable criterion, this is the
best that one can do. Precisely, we compute its performance
according to two criteria: rate of convergence to the average
of the initial positions and an LQR criterion. We find
that a deadbeat strategy is optimal according to the rate
of convergence, and nearly optimal according to the LQR
criterion. Finally, we compare it with another strategy
having limited communication and exhibiting symmetries: 
the Cayley strategies [6], [5]. It should be noted however 
that our strategy is limited to the case where the number of 
agents is an exact power of v. Whether it is possible to build 
a linear time-invariant deadbeat strategy for any number of 
agents (for a given v) remains an open problem. 

The paper is organized as follows. In Section II we provide 
some basic notions of graph theory and some notational 
conventions. In Section III we formally define the average 
consensus problem. In Section IV we introduce the block 
Kronecker strategy. In Section V we show that the block
Kronecker strategy is the quickest possible strategy and
we compare it with the Cayley strategy. In Section VI we
evaluate the performance of the block Kronecker strategy 
according to suitable quadratic criteria. Finally we gather 
our conclusions in Section VII. 

II. Preliminaries on graph theory 

Before defining the problem we want to solve, we summarize
some notions of graph theory that will be useful
throughout the rest of the paper.

Let $\mathcal{G} = (V, \mathcal{E})$ be a directed graph where $V = \{1, \ldots, N\}$
is the set of vertices and $\mathcal{E} \subseteq V \times V$ is the set of arcs or
edges. If $(i, j) \in \mathcal{E}$ we say that the arc $(i, j)$ is outgoing
from $i$ and incoming in $j$. The adjacency matrix $A$ is a
$\{0, 1\}$-valued square matrix indexed by the elements in $V$
defined by letting $A_{ij} = 1$ if and only if $(i, j) \in \mathcal{E}$. Define
the in-degree of a vertex $j$ as $\sum_i A_{ij}$ and the out-degree of
a vertex $i$ as $\sum_j A_{ij}$. In our setup we admit the presence
of self-loops. A graph is called in-regular (out-regular) of
degree $k$ if each vertex has in-degree (out-degree) equal to $k$.

A path in $\mathcal{G}$ consists of a sequence of vertices $i_1 i_2 \ldots i_r$
such that $(i_\ell, i_{\ell+1}) \in \mathcal{E}$ for every $\ell = 1, \ldots, r - 1$; $i_1$ (resp.
$i_r$) is said to be the initial (resp. terminal) vertex of the path.
A cycle is a path in which the initial and the terminal vertices
coincide. A vertex $i$ is said to be connected to a vertex $j$ if
there exists a path with initial vertex $i$ and terminal vertex $j$.
A directed graph is said to be connected if, given any pair of
vertices $i$ and $j$, either $i$ is connected to $j$ or $j$ is connected
to $i$. A directed graph is said to be strongly connected if,
given any pair of vertices $i$ and $j$, $i$ is connected to $j$.

Finally, some notational conventions. Let $A$ be any matrix
belonging to $\mathbb{R}^{N \times N}$. With $\mathrm{Tr}\, A$ we denote the trace of
$A$, i.e. the sum of the diagonal entries. We say that $A$ is
nonnegative, denoted $A \ge 0$, or positive, denoted $A > 0$, if
the entries of $A$ are respectively nonnegative or positive.

III. Problem formulation 

We suppose that the positions of all $N$ agents are listed
in one vector of dimension $N$. If the agents move, say, in
$\mathbb{R}^3$, it seems that we would need a $3N$-dimensional vector.
However we will suppose that the positions are scalar, as
every linear strategy on scalar positions, if applied separately
to every component of the position, trivially extends to
a strategy for higher dimensions.

More precisely, the problem of interest can be formalized
in the following way. Consider $N > 1$ identical systems
whose dynamics are described by the following discrete-time
state equations

$$x_i^+ = x_i + u_i, \qquad i = 1, \ldots, N,$$

where $x_i \in \mathbb{R}$ is the state of the $i$-th system, $x_i^+$ represents
the updated state and $u_i \in \mathbb{R}$ is the control input. More
compactly we can write

$$x^+ = x + u \qquad (1)$$

where $x, u \in \mathbb{R}^N$. The goal is to design a feedback control
law

$$u = Kx, \qquad K \in \mathbb{R}^{N \times N},$$

yielding the average consensus, namely a control such that
all the $x_i$'s become asymptotically equal to the average of the
initial condition. More precisely, our objective is to obtain
$K$ such that, for any initial condition $x(0) \in \mathbb{R}^N$, the closed
loop system

$$x^+ = (I + K)x$$

yields

$$\lim_{t \to \infty} x(t) = \alpha \mathbf{1} \qquad (2)$$

where $\mathbf{1} := [1, \ldots, 1]^T$ and

$$\alpha = \frac{1}{N} \mathbf{1}^T x(0). \qquad (3)$$



Writing x(t) as a linear combination of the eigenvectors 
of I + K , it is almost immediate to see that the average 
consensus problem is solved if and only if the following 
three conditions hold: 

(A) Every row and every column of I + K sums to one. 
Hence it has eigenvalue 1 with 1 as left and right 
eigenvector. 

(B) The eigenvalue 1 of I + K has algebraic multiplicity 
one (namely it is a simple root of the characteristic 
polynomial of I + K). 

(C) All the other eigenvalues are strictly inside the unit 
circle. 

For nonnegative matrices, namely for matrices having all 
the components nonnegative, condition (A) is called double 
stochasticity, condition (B) is ergodicity and condition (C) 
is a consequence of double stochasticity. We do not require 
our matrices to be nonnegative, even though it will appear 
that the optimal matrices are. 

Observe now that the fact that the element in position $(i, j)$
of the matrix $I + K$ is different from zero means that
system $i$ needs to know exactly the state of system $j$ in
order to compute its feedback action. This implies that the
$j$-th agent must communicate its state $x_j$ to the $i$-th agent. In
this context a good description of the communication effort
required by a specific feedback $K$ is given by the directed
graph $\mathcal{G}_{I+K}$ with set of vertices $\{1, \ldots, N\}$ in which there
is an arc from $j$ to $i$ whenever
$(I + K)_{ij} \neq 0$. The graph $\mathcal{G}_{I+K}$ is said to be
the communication graph associated with $K$. Conversely,
given any directed graph $\mathcal{G}$ with set of vertices $\{1, \ldots, N\}$,
a feedback $K$ is said to be compatible with $\mathcal{G}$ if $\mathcal{G}_{I+K}$ is a
subgraph of $\mathcal{G}$ (we will use the notation $\mathcal{G}_{I+K} \subseteq \mathcal{G}$).

In the sequel, we will impose the following constraint 
on the communication graph: the max in-degree of the 
nodes is v. This models the fact that communication lines 
are costly to establish or operate, and every agent has the 
right to talk to a limited number of other agents. Note 
that for compatibility with usual conventions we consider 
that v counts all arcs entering a node, including self-loops 
(which could be considered as 'free communication' in most 
technological situations). 

Without this constraint, the problem becomes trivial: 
choose the complete graph, and the consensus is reached 
in one step. We therefore add the following constraint on 
I + K: 

(D) Every row of I + K contains at most v non-zero 
elements. 
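The conditions above can be checked numerically on a small example. The following is a minimal sketch, not part of the original development: the 4-agent matrix `P` (playing the role of $I + K$), the initial state and the helper names are all hypothetical. It tests conditions (A) and (D) directly and observes the convergence that (B) and (C) guarantee by simply iterating the closed loop, without computing eigenvalues.

```python
from fractions import Fraction

def satisfies_A(P):
    """Condition (A): every row and every column sums to one."""
    n = len(P)
    rows_ok = all(sum(row) == 1 for row in P)
    cols_ok = all(sum(P[i][j] for i in range(n)) == 1 for j in range(n))
    return rows_ok and cols_ok

def max_in_degree(P):
    """Condition (D): largest number of non-zero entries in a row (self-loops count)."""
    return max(sum(1 for entry in row if entry != 0) for row in P)

def step(P, x):
    """One closed-loop update x+ = P x."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]

# Hypothetical example: each agent averages itself and its successor on a ring.
half = Fraction(1, 2)
P = [[half, half, 0, 0],
     [0, half, half, 0],
     [0, 0, half, half],
     [half, 0, 0, half]]
x0 = [Fraction(value) for value in (4, 0, -2, 6)]
average = sum(x0) / len(x0)

x = x0
for _ in range(60):
    x = step(P, x)
spread = max(abs(xi - average) for xi in x)   # distance from the average after 60 steps
```

Exact rational arithmetic is used so that the sum of the states, which condition (A) keeps constant, can be compared with equality rather than with a tolerance.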

From this point of view we would like to obtain a matrix
$I + K$ satisfying (A),(B),(C),(D) and minimizing a suitable
performance index. The simplest control performance index
is the exponential rate of convergence to the average
consensus. When we are dealing with average consensus
controllers it is meaningful to consider the displacement from
the average of the initial condition

$$\Delta(t) := x(t) - \Big(\frac{1}{N}\mathbf{1}^T x(0)\Big)\mathbf{1}.$$

It is immediate to check that $\Delta(t) = x(t) - \big(\frac{1}{N}\mathbf{1}^T x(t)\big)\mathbf{1}$
(since the average position $\frac{1}{N}\mathbf{1}^T x(t)$ is the same at all times
$t$) and that it satisfies the closed loop equation

$$\Delta^+ = (I + K)\Delta. \qquad (4)$$

Notice moreover that the initial conditions $\Delta(0)$ are such
that

$$\mathbf{1}^T \Delta(0) = 0. \qquad (5)$$

Hence the asymptotic behavior of our consensus problem can
equivalently be studied by looking at the evolution (4) on the
hyperplane characterized by the condition (5). The speed of
convergence toward the average of the initial condition can
be defined as follows. Let $P$ be any matrix satisfying conditions
(A),(B),(C). Define

$$\rho(P) = \begin{cases} 1 & \text{if } \dim\ker(P - I) > 1, \\ \max_{\lambda \in \sigma(P) \setminus \{1\}} |\lambda| & \text{if } \dim\ker(P - I) = 1, \end{cases}$$

which is called the essential spectral radius of $P$. As the
dominant eigenvalue of $P^t$ is one and the others are at most
$\rho(P)^t$ in magnitude, the essential spectral radius says
how quickly $P^t$ converges to the rank-one matrix $\frac{1}{N}\mathbf{1}\mathbf{1}^T$,
where $N$ is the dimension of $P$. In this context the index
$\rho(I + K)$ seems quite appropriate for analyzing how performance
is related to the communication effort associated
with a graph. The smaller the essential spectral radius, the
quicker the system will converge to the average of the initial
condition.

However, in control theory, strategies that converge in
finite time or very quickly are sometimes dismissed on
the grounds that they lead to large update values
$u(t) = x(t + 1) - x(t)$, which can be physically impossible
or very costly to implement. Hence a strategy is often
required to optimize an LQR cost, taking into account both
the quickness of convergence and the norm of the update values.
Therefore another suitable measure of performance could be
the following quantity:

$$J = \mathbb{E} \sum_{t \ge 0} \big( \|x(t) - x(\infty)\|^2 + \gamma \|u(t)\|^2 \big), \qquad (6)$$

where $x(t)$ is the vector of positions at time $t$, $x(\infty) = \lim_{t \to \infty} x(t)$
is the vector whose every entry is the average
of initial positions, $u(t) = x(t + 1) - x(t)$ is the update
vector at time $t$, the initial positions are supposed to be
uncorrelated random variables with unit variance, $\mathbb{E}$ denotes
the expectation, $\|x\|^2 = x^T x$ is the squared Euclidean norm and $\gamma$
is a nonnegative real.

We will prove that the optimal topology of communication
(in the sense of speed of convergence) is given by a de
Bruijn graph. We will call the control strategies based on
such a graph block Kronecker strategies, as explained in the
next section. For these strategies we will evaluate (6) and
we will compare them to another family of strategies based
on a regular communication graph having the same degree
$v$: the Cayley strategies [6], [5].



IV. Block Kronecker strategies 

In this section, we define block Kronecker strategies. Let
$A$ be an $n \times n$ matrix satisfying (A),(B),(C),(D) and $k$ be a
nonnegative integer. Note that if $A$ is full then $n \le v$ (since
the number of non-zero elements in a row cannot exceed $v$). Then we
build an $n^k \times n^k$ matrix $M$ in the following way. Let

$$A = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix}$$

be a row-partition of the matrix $A$, where $a_j \in \mathbb{R}^{1 \times n}$. Then
$M$ is the matrix

$$M = \begin{pmatrix} I_{n^{k-1}} \otimes a_0 \\ I_{n^{k-1}} \otimes a_1 \\ \vdots \\ I_{n^{k-1}} \otimes a_{n-1} \end{pmatrix}. \qquad (7)$$

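For example, if

$$A = \begin{pmatrix} \alpha & \beta \\ \beta & \alpha \end{pmatrix}$$

(with $\alpha + \beta = 1$) and $k = 3$, then

$$M = \begin{pmatrix}
\alpha & \beta & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \alpha & \beta & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \alpha & \beta & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \alpha & \beta \\
\beta & \alpha & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \beta & \alpha & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \beta & \alpha & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \beta & \alpha
\end{pmatrix}.$$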
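The construction of Equation (7) can be sketched in a few lines. The function name `block_kronecker_strategy` and the numerical values chosen for $\alpha$ and $\beta$ are assumptions made for illustration only; the code stacks the blocks $I_{n^{k-1}} \otimes a_r$ one row of $A$ at a time.

```python
def block_kronecker_strategy(A, k):
    """Stack I_{n^(k-1)} (x) a_r for each row a_r of A, as in Equation (7)."""
    n = len(A)
    m = n ** (k - 1)                      # size of the identity factor
    N = n ** k
    M = [[0] * N for _ in range(N)]
    for r, a_r in enumerate(A):           # block of rows built from a_r
        for q in range(m):                # row q of I_m (x) a_r
            for c, value in enumerate(a_r):
                M[r * m + q][q * n + c] = value
    return M

alpha, beta = 0.7, 0.3                    # any pair with alpha + beta = 1
M = block_kronecker_strategy([[alpha, beta], [beta, alpha]], 3)

# Column indices of the non-zero entries of each row (rows and columns numbered from 0).
support = [[j for j, value in enumerate(row) if value != 0] for row in M]
```

With rows and columns numbered from zero, row $i$ of the resulting $8 \times 8$ matrix has its two non-zero entries in columns $2i$ and $2i + 1$ modulo 8, which is the pattern displayed in the example.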



This is a kind of block Kronecker product. A general
theory of block Kronecker products is built in [17]. We
only need a more restricted definition, detailed below. The
new matrix $M$ is a matrix of larger dimension than $A$ and
satisfying conditions (A),(B),(C),(D): (A) and (D) follow
from the definition, while (B) and (C) are proved below.
Hence it can play the role of the matrix $I + K$ in Section
III. We start with some reminders on the Kronecker product, define
the block Kronecker product and explore the properties of
the latter.

A. Kronecker product 

We recall that the Kronecker product $A \otimes B$ of the matrices
$A$ and $B$ is the matrix $[a_{ij}B]_{i,j}$, whose dimensions are the
products of the dimensions of $A$ and $B$. Some useful properties
of the Kronecker product are the following:

• $AB \otimes CD = (A \otimes C)(B \otimes D)$;

• $\mathrm{Tr}\,(A \otimes B) = \mathrm{Tr}\,A \; \mathrm{Tr}\,B$;

• the eigenvalues of $A \otimes B$ are all possible products of
an eigenvalue of $A$ with an eigenvalue of $B$;

• the eigenvectors of $A \otimes B$ are all possible Kronecker
products of an eigenvector of $A$ with an eigenvector of
$B$.
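The first two properties in the list above can be checked on small integer matrices. This is only an illustrative sketch: the helper names `kron`, `matmul`, `trace` and the four test matrices are arbitrary choices, not taken from the paper.

```python
def kron(A, B):
    """Kronecker product [a_ij B] of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def matmul(X, Y):
    """Ordinary matrix product."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

# Arbitrary 2 x 2 integer matrices, chosen only for the check.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
C = [[2, 0], [1, 3]]
D = [[1, 1], [0, 2]]

mixed_lhs = kron(matmul(A, B), matmul(C, D))      # AB (x) CD
mixed_rhs = matmul(kron(A, C), kron(B, D))        # (A (x) C)(B (x) D)
trace_lhs = trace(kron(A, C))                     # Tr (A (x) C)
trace_rhs = trace(A) * trace(C)                   # Tr A Tr C
```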

The Kronecker product is sometimes called tensor product.
Let us see why. For instance consider the matrices $B, C, D$
of sizes $m_B \times n_B$, $m_C \times n_C$, $m_D \times n_D$. The Kronecker
product $B \otimes C \otimes D$ has size $m_B m_C m_D \times n_B n_C n_D$, and an arbitrary
element of $B \otimes C \otimes D$ can be denoted $(B \otimes C \otimes D)_{abc,def} =
B_{ad} C_{be} D_{cf}$, where the index written as $abc$ denotes the
number $c + b\,m_D + a\,m_C m_D$ and the index $def$ is the
number $f + e\,n_D + d\,n_C n_D$; we suppose that the indices
start from zero: $a = 0, \ldots, m_B - 1$, etc. If $B, C, D$ happen
to be square matrices of size $n$, this notation coincides with
the usual notation in base $n$ of an index running from 0
to $n^3 - 1$. This notation of the Kronecker product is very
close to the tensor product used in algebra and differential
geometry. The only difference is that $B \otimes C \otimes D$, viewed as
a tensor product, is considered as a 6-dimensional array with
$a, b, c, d, e, f$ as separate indices, instead of a matrix (i.e., a
2-dimensional array). All this immediately extends to more
than three matrices.

B. Block Kronecker product 

Let us now consider the following variant of the Kronecker
product, that we call block Kronecker product. Consider for
instance two matrices $B$ (of size $n^3 \times n^3$) and $C$ (of size
$n^2 \times n^2$). The block Kronecker product of $B$ and $C$ is defined
as follows: its element of index $abcde, ghijk$ is the element
$B_{cde,ghi} C_{ab,jk}$ (notice the shift of the first indices by two
places). We will denote this matrix by $B \odot C$. This definition
applies to any two square matrices whose dimensions are
powers of $n$. In general, we can write $(B \odot C)_{p,q} =
(B \otimes C)_{\sigma^t p, q}$, where $\sigma$ operates a cyclic permutation by
one place to the left on the digits of $p$ in base $n$, and $C$ is
of size $n^t$.

The matrix $M$ defined by Equation (7) can be expressed
as $M = (I \otimes \cdots \otimes I) \odot A$ (where the $n \times n$ identity matrix
$I$ is repeated $k - 1$ times). If we write the index of $M$ in
base $n$, then $M_{i_1 \ldots i_k, j_1 \ldots j_k} = I_{i_2,j_1} I_{i_3,j_2} \cdots I_{i_k,j_{k-1}} A_{i_1,j_k}$.
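The index-shift definition of the block Kronecker product translates directly into code. The sketch below is an illustration under stated assumptions: the names `kron`, `block_kron` and `identity` and the test matrix `A` are hypothetical. A cyclic left shift of the row index by the number of digits of the second factor amounts to selecting a permuted row of the ordinary Kronecker product, and the result is compared entry by entry with the base-$n$ index formula for $M$.

```python
from fractions import Fraction

def kron(B, C):
    """Ordinary Kronecker product [b_ij C]."""
    return [[b * c for b in row_b for c in row_c]
            for row_b in B for row_c in C]

def block_kron(B, C):
    """Block Kronecker product: row p is the row of B (x) C whose index is p
    with its leading digits (as many as C has) moved to the end."""
    size_b, size_c = len(B), len(C)
    K = kron(B, C)
    return [K[(p % size_b) * size_c + p // size_b] for p in range(size_b * size_c)]

def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

n, k = 2, 3
A = [[Fraction(2, 3), Fraction(1, 3)],
     [Fraction(1, 3), Fraction(2, 3)]]
N = n ** k
M = block_kron(identity(n ** (k - 1)), A)     # (I (x) ... (x) I) block-Kronecker A

# Index formula: the entry in row i = i_1...i_k and column (n*i + c) mod N is A[i_1][c].
matches_index_formula = all(
    M[i][(n * i + c) % N] == A[i // n ** (k - 1)][c]
    for i in range(N) for c in range(n)
)
```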

This form is particularly useful to compute the behavior
of $M$ from the properties of the block Kronecker product,
which we now explore. As a first property, we can easily see
that

$$(B \odot C)^T = C^T \odot B^T. \qquad (8)$$

We can also prove the following lemma.

Lemma 4.1: For any matrices $A, B, C, D, E, F$ for which
all the products below are meaningful, we have

$$((A \otimes B) \odot C)((D \otimes E) \odot F) = BD \odot (CE \otimes AF). \qquad (9)$$

Proof: We write, using Einstein's convention (indices
repeated twice in an expression are implicitly summed over),

$$[((A \otimes B) \odot C)((D \otimes E) \odot F)]_{u,w} = ((A \otimes B) \odot C)_{u,v} \, ((D \otimes E) \odot F)_{v,w}
= A_{u_2,v_1} B_{u_3,v_2} C_{u_1,v_3} D_{v_2,w_1} E_{v_3,w_2} F_{v_1,w_3},$$

where $u$, $v$, and $w$, interpreted as sequences of digits in base
$n$, have been partitioned into $u_1 u_2 u_3$, $v_1 v_2 v_3$, and $w_1 w_2 w_3$
in an appropriate way. This is possible if $B$ and $D$ have the same
size, as well as $C$ and $E$, and $A$ and $F$. Then the expression
above can be regrouped as

$$(BD)_{u_3,w_1} (CE)_{u_1,w_2} (AF)_{u_2,w_3} = (BD \odot (CE \otimes AF))_{u,w},$$

which ends the proof. ■

In particular, if $B = D = 1$ we have

$$(A \odot C)(E \odot F) = CE \otimes AF. \qquad (10)$$

If we choose $C = E = 1$ instead, we have

$$(A \otimes B)(D \odot F) = BD \odot AF. \qquad (11)$$

The following proposition provides an interesting characterization
of the powers of any order of the matrix $M$.

Proposition 4.1: For $A$ a square matrix, $M$ defined by
Equation (7), and any integers $r \ge 0$ and $0 \le s < k$,

$$M^{rk+s} = (\underbrace{A^r \otimes \cdots \otimes A^r}_{k-s}) \odot (\underbrace{A^{r+1} \otimes \cdots \otimes A^{r+1}}_{s}),$$

where the exponents in the right-hand side sum to $rk + s$.

Proof: We prove the claim by induction on $rk + s$. It is
true by definition for $rk + s = 1$. The induction step is easily
proved by applying Equation (9). Indeed,
$[(A^r \otimes (A^r \otimes \cdots \otimes A^r)) \odot (A^{r+1} \otimes \cdots \otimes A^{r+1})][((I \otimes \cdots \otimes I) \otimes (I \otimes \cdots \otimes I)) \odot A]$
can be written as $(A^r \otimes \cdots \otimes A^r) \odot (A^{r+1} \otimes \cdots \otimes A^{r+1} \otimes
(A^r A))$. The argument is correct also for the limit cases $s = 0$
and $s = k - 1$. ■
In particular we have the following. 

Corollary 4.1: For $A$ a square matrix and $M$ defined by
Equation (7),

$$M^k = A \otimes \cdots \otimes A.$$

Moreover, if $A$ satisfies (A),(B),(C) the essential spectral
radius of $M$ is the $k$th root of the essential spectral radius
of $A$.

Proof: The first part is a particular case of Proposition
4.1. From the properties of the Kronecker product, we know the
spectrum of $M^k$ is composed of all possible products of $k$
eigenvalues of $A$. Hence the largest eigenvalue in absolute
value, different from 1, of the matrix $M^k$ is $1^{k-1}\lambda$,
where $\lambda$ denotes the largest eigenvalue in absolute value,
different from 1, of the matrix $A$. ■
This also proves that conditions (B) and (C) are
verified for $M$ when they are for $A$. If we take

$$A = \frac{1}{n}\mathbf{1}\mathbf{1}^T, \qquad (12)$$

of size $n$, then $M^k$ is the matrix $\frac{1}{n^k}\mathbf{1}\mathbf{1}^T$ of size $n^k$ with
all identical elements. Thus we have a strategy converging
exactly in $k$ steps. We comment further on this example in the
next section. Another property of $M$ that will prove useful
is stated in the next proposition.
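The finite-time claim for the choice $A = \frac{1}{n}\mathbf{1}\mathbf{1}^T$ can be verified exactly on a small instance. The sketch below makes two assumptions that are not spelled out at this point of the text: the helper names are hypothetical, and $M$ is filled in using the fact that, for this $A$, row $i$ of the matrix of Equation (7) carries the value $1/n$ in columns $ni, \ldots, ni + n - 1$ modulo $n^k$ (rows and columns numbered from zero).

```python
from fractions import Fraction

def matmul(X, Y):
    """Ordinary matrix product on lists of rows."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def deadbeat_matrix(n, k):
    """M of Equation (7) for A = (1/n) 1 1^T, filled in row by row."""
    N = n ** k
    M = [[Fraction(0)] * N for _ in range(N)]
    for i in range(N):
        for d in range(n):
            M[i][(n * i + d) % N] = Fraction(1, n)
    return M

n, k = 3, 2
N = n ** k
M = deadbeat_matrix(n, k)

powers = [M]                                  # M^1, ..., M^k
for _ in range(k - 1):
    powers.append(matmul(powers[-1], M))

# For each power, is it the matrix with all entries equal to 1/N?
is_uniform = [all(entry == Fraction(1, N) for row in P for entry in row)
              for P in powers]
```

Here `is_uniform` is false for $M$ and true for $M^2$: consensus on the exact average is reached at step $k$ and not before.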

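Proposition 4.2: For $A$ a square matrix, $M$ defined by
Equation (7), and any integers $r \ge 0$ and $0 \le s < k$,

$$(M^T)^{rk+s} M^{rk+s} = \underbrace{(A^T)^r A^r \otimes \cdots \otimes (A^T)^r A^r}_{k-s} \otimes \underbrace{(A^T)^{r+1} A^{r+1} \otimes \cdots \otimes (A^T)^{r+1} A^{r+1}}_{s},$$

where the sum of the exponents is $rk + s$.

Proof: From Proposition 4.1 we know that $M^{rk+s} =
(A^r \otimes \cdots \otimes A^r) \odot (A^{r+1} \otimes \cdots \otimes A^{r+1})$. Hence, by Equation
(8), $(M^T)^{rk+s} = ((A^T)^{r+1} \otimes \cdots \otimes (A^T)^{r+1}) \odot ((A^T)^r \otimes \cdots \otimes (A^T)^r)$.
These two expressions are multiplied using Equation (10). ■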

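Now we would like to compute $\mathrm{Tr}\,(M^T)^t M^{t+1}$. This will
be useful later when we evaluate the performance of the
block Kronecker strategy. We first need the following lemma.

Lemma 4.2: Let $B_0, B_1, \ldots, B_{k-1}$ be $k$ square matrices
of the same dimensions. If $\ell < k$ is relatively prime to $k$, then

$$\mathrm{Tr}\,(B_0 \otimes B_1 \otimes \cdots \otimes B_{\ell-1}) \odot (B_\ell \otimes \cdots \otimes B_{k-1}) =
\mathrm{Tr}\, B_0 B_\ell B_{2\ell} B_{3\ell} \cdots B_{(k-1)\ell},$$

where the indices are understood modulo $k$.

Proof: If we use Einstein's convention (repeated indices
are summed over), we can write

$$\begin{aligned}
\mathrm{Tr}\,(B_0 \otimes B_1 \otimes \cdots \otimes B_{\ell-1}) &\odot (B_\ell \otimes \cdots \otimes B_{k-1}) \\
&= [(B_0 \otimes B_1 \otimes \cdots \otimes B_{\ell-1}) \odot (B_\ell \otimes \cdots \otimes B_{k-1})]_{p,p} \\
&= (B_0)_{p_{k-\ell},p_0} (B_1)_{p_{k-\ell+1},p_1} \cdots (B_{\ell-1})_{p_{k-1},p_{\ell-1}}
   (B_\ell)_{p_0,p_\ell} \cdots (B_{k-1})_{p_{k-\ell-1},p_{k-1}} \\
&= (B_0)_{p_{k-\ell},p_0} (B_\ell)_{p_0,p_\ell} (B_{2\ell})_{p_\ell,p_{2\ell}}
   (B_{3\ell})_{p_{2\ell},p_{3\ell}} \cdots (B_{(k-1)\ell})_{p_{(k-2)\ell},p_{(k-1)\ell}} \\
&= \mathrm{Tr}\, B_0 B_\ell B_{2\ell} \cdots B_{(k-1)\ell},
\end{aligned}$$

where $p = p_0 p_1 \cdots p_{k-1}$. ■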
Proposition 4.3: For $A$ and $M$ as defined above, if $A$ is
normal (i.e., $A^T A = A A^T$) then

$$\mathrm{Tr}\,(M^T)^t M^{t+1} = \mathrm{Tr}\,(A^T)^t A^{t+1}.$$

Proof: We know that $(M^T)^t M^t = (A^T)^r A^r \otimes \cdots \otimes
(A^T)^{r+1} A^{r+1}$, if $t = rk + s$ for some $0 \le s < k$. Hence

$$(M^T)^t M^{t+1} = ((A^T)^r A^r \otimes \cdots \otimes (A^T)^{r+1} A^{r+1})((I \otimes \cdots \otimes I) \odot A),$$

which by Equation (11) is equal to $((A^T)^r A^r \otimes \cdots \otimes
(A^T)^{r+1} A^{r+1}) \odot (A^T)^r A^{r+1}$. By Lemma 4.2 this matrix has
the same trace as $(A^T)^r A^r \cdots (A^T)^{r+1} A^{r+1} (A^T)^r A^{r+1}$. As $A^T$
and $A$ commute, this is also the trace of $(A^T)^t A^{t+1}$. ■

C. De Bruijn graph

The communication graph of $M$ is (a subgraph of) a de
Bruijn graph, which has $n^k$ vertices and arcs from any $i$ to
$ni, ni + 1, ni + 2, \ldots$ and $ni + n - 1$ (all modulo $n^k$). In
particular, if $A$ is given by Equation (12), then $M$ is the
adjacency matrix of a de Bruijn graph, normalized so that
every row sums to one. This graph was introduced by de
Bruijn [10] in 1946 and has been considered for efficient
distribution of information in different contexts, such as
parallel computing [23] and peer-to-peer networks [13]. This
paper can be seen as an extension of this idea to consensus
problems.



D. Design decentralisation 

The process of convergence to consensus is itself decentralised,
in the sense that every agent acts on its own.
However the communication strategy (who talks to whom?)
must be designed once and for all beforehand. This can be done
in a centralised way, where a new external agent dispatches to
every other agent its own strategy. This can also be done in
a decentralised way, where every agent is attributed a number
$i$ between 0 and $N - 1$ and then finds the agents of number
$vi, vi + 1, \ldots, vi + v - 1$ (modulo $N$). Achieving this in the most effective
way is a problem of its own, and is not treated in this paper.
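The decentralised rule described above is a one-line computation for each agent. The sketch below is illustrative: the function names are hypothetical and agents are assumed to be numbered from 0 to $N - 1$. It also tracks which initial positions can have influenced a given agent after a number of steps.

```python
def in_neighbours(i, v, N):
    """Agents whose state agent i reads: v*i, v*i + 1, ..., v*i + v - 1 (mod N)."""
    return [(v * i + d) % N for d in range(v)]

def sources_after(i, v, N, steps):
    """Agents whose initial state can have influenced agent i after `steps` updates."""
    known = {i}
    for _ in range(steps):
        known = {j for a in known for j in in_neighbours(a, v, N)}
    return known

# With v = 3 and N = 81 = 3^4, every agent is reached by all others in 4 steps.
reached_in_3 = sources_after(7, 3, 81, 3)
reached_in_4 = sources_after(7, 3, 81, 4)
```

For $N = v^k$ the set of sources grows by a factor $v$ at each step, so after three steps agent 7 depends on 27 initial positions and after four steps on all 81.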

V. The quickest possible strategy 

We have seen that starting from $A$ with all identical entries,
we get arbitrarily large matrices $M$ converging in finite time
$k$. If we write $N = n^k$ for the dimension of $M$, this convergence
time is $\log N / \log n = \log N / \log v$, where $v$ is the maximal
in-degree of the graph of communication for $M$. We can
see that no strategy, whether linear or not, whether time-invariant
or not, can converge more rapidly. Indeed, to reach
the average of the initial conditions, every agent must have
information about all other agents, but it can only know
at most $v$ positions (including its own) after one step of time,
$v^2$ after two steps, etc. Hence the propagation of information needs around
$\log N / \log v$ steps to connect all agents. This reasoning is
made rigorous in the following proposition.

Proposition 5.1: Let $M \in \mathbb{R}^{N \times N}$ be such that $M \ge 0$. Let
$v$ be defined as above. Then $M^k > 0$ implies $v^k \ge N$.

Proof: The fact that $M^k > 0$ implies that for any pair of
nodes $(i, j)$ there exists in the graph $\mathcal{G}_M$ a path connecting $i$
to $j$ of length $k$. Hence there are at least $N^2$ paths of length
$k$. Let now $P_i$ denote the number of paths having length $i$.
The previous consideration implies that $P_k \ge N^2$. On the
other hand it is easy to see that $P_1 \le vN$ and in general
that $P_i \le v^i N$, from which we get that $P_k \le v^k N$. Hence
$v^k N \ge N^2$, from which it results that $v^k \ge N$. ■
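The bound of Proposition 5.1 gives the smallest number of steps compatible with a given in-degree. A minimal sketch, with the hypothetical helper name `min_steps`, using integer arithmetic only to avoid rounding issues with logarithms:

```python
def min_steps(N, v):
    """Smallest k such that v**k >= N, i.e. the lower bound of Proposition 5.1."""
    k, reach = 0, 1
    while reach < N:
        reach *= v
        k += 1
    return k

steps_81 = min_steps(81, 3)    # N is an exact power of v
steps_82 = min_steps(82, 3)    # one more agent requires one more step
```

When $N$ is an exact power of $v$ the bound equals $\log N / \log v$, which is the convergence time of the deadbeat block Kronecker strategy.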

The above proposition considers only the time-invariant
case. An identical result can be found for the time-varying
case, showing that there is no difference, in terms of speed
of convergence toward the meeting point, between the time-invariant
and the time-varying cases. This can be seen as an
a posteriori justification of our interest in the class of
time-invariant strategies.

A linear time-invariant strategy converges in finite time iff 
its essential spectral radius is 0. For a strategy converging in 
infinite time, the essential spectral radius is a good measure 
of the convergence to the average of the initial conditions, 
as already mentioned. 

A. Comparison between block Kronecker strategy and Cayley
strategy

In this subsection we propose a comparison of the block 
Kronecker strategy with another strategy whose underlying 
communication graph has limited max in-degree and exhibits 
strong symmetries: the Cayley strategy. 
First we recall the concept of Cayley graph defined on
Abelian groups [2], [1]. Let $G$ be any finite Abelian group
(the internal operation will always be denoted $+$) of order $|G| =
N$, and let $S$ be a subset of $G$ containing zero. The Cayley
graph $\mathcal{G}(G, S)$ is the directed graph with vertex set $G$ and
arc set

$$\mathcal{E} = \{(g, h) : h - g \in S\}.$$

Notice that a Cayley graph is always in-regular, namely the
in-degree of each vertex is equal to $|S|$. Notice also that
strong connectivity can be checked algebraically. Indeed,
it can be seen that a Cayley graph $\mathcal{G}(G, S)$ is strongly
connected if and only if the set $S$ generates the group $G$,
which means that any element in $G$ can be expressed as a
finite sum of (not necessarily distinct) elements in $S$. If $S$ is
such that $-S = S$ we say that $S$ is inverse-closed. In this
case the graph obtained is undirected.

A notion of Cayley structure can also be introduced for
matrices. Let $G$ be any finite Abelian group of order $|G| =
N$. A matrix $P \in \mathbb{R}^{G \times G}$ is said to be a Cayley matrix over
the group $G$ if

$$P_{i,j} = P_{i+h,j+h} \qquad \forall\, i, j, h \in G.$$

It is clear that for a Cayley matrix $P$ there exists a $\pi : G \to \mathbb{R}$
such that $P_{i,j} = \pi(i - j)$. The function $\pi$ is called the
generator of the Cayley matrix $P$. Notice that, if $P$ is a
Cayley matrix generated by $\pi$, then $\mathcal{G}_P$ is a Cayley graph
with $S = \{h \in G : \pi(h) \neq 0\}$. Moreover, it is easy to see
that for any Cayley matrix $P$ we have that $P\mathbf{1} = \mathbf{1}$ if and
only if $\mathbf{1}^T P = \mathbf{1}^T$. This implies that a Cayley stochastic
matrix is automatically doubly stochastic. In this case the
function $\pi$ associated with the matrix $P$ is a probability
distribution on the group $G$. Among the multiple possible
choices of the probability distribution $\pi$, one is particularly
simple, namely $\pi(g) = 1/|S|$ for every $g \in S$.

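Example 1: Let us consider the group $\mathbb{Z}_N$ of integers
modulo $N$ and the Cayley graph $\mathcal{G}(\mathbb{Z}_N, S)$ where $S =
\{-1, 0, 1\}$. Notice that in this case $S$ is inverse-closed.
Consider the uniform probability distribution

$$\pi(0) = \pi(1) = \pi(-1) = 1/3.$$

The corresponding Cayley stochastic matrix is given by

$$P = \begin{pmatrix}
1/3 & 1/3 & 0 & \cdots & 0 & 1/3 \\
1/3 & 1/3 & 1/3 & 0 & \cdots & 0 \\
0 & 1/3 & 1/3 & 1/3 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & 1/3 & 1/3 & 1/3 \\
1/3 & 0 & \cdots & 0 & 1/3 & 1/3
\end{pmatrix}. \qquad (13)$$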

Notice that in this case we have two symmetries. The first 
is that the graph is undirected and the second that the graph 
is circulant. These symmetries can be seen in the structure 
of the transition matrix P that, indeed, turns out to be both 
symmetric and circulant [9]. 
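The ring example can be reproduced numerically. The sketch below is an illustration with hypothetical function names. It builds the Cayley matrix over $\mathbb{Z}_N$ from a generator given as a dictionary and evaluates the essential spectral radius through a standard fact about circulant matrices that is not stated in the text: their eigenvalues are the discrete Fourier transform of the generator. It assumes the eigenvalue 1 is simple, which holds for this generator.

```python
import cmath
import math

def cayley_matrix(N, pi):
    """P[i][j] = pi((i - j) mod N) for a generator pi given as {group element: weight}."""
    return [[pi.get((i - j) % N, 0.0) for j in range(N)] for i in range(N)]

def circulant_essential_radius(N, pi):
    """Largest modulus among the eigenvalues at non-zero frequencies.
    Relies on the DFT formula for circulant eigenvalues and assumes
    that the eigenvalue 1 (frequency 0) is simple."""
    eigenvalues = [
        sum(w * cmath.exp(-2j * math.pi * m * h / N) for h, w in pi.items())
        for m in range(1, N)
    ]
    return max(abs(e) for e in eigenvalues)

N = 81
pi = {0: 1 / 3, 1: 1 / 3, N - 1: 1 / 3}      # S = {-1, 0, 1}, uniform weights
P = cayley_matrix(N, pi)
rho = circulant_essential_radius(N, pi)       # equals (1 + 2 cos(2 pi / N)) / 3 here
```

For $N = 81$ the value of `rho` is very close to one, which is the slow convergence that the comparison below quantifies.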

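Example 2: Let us now consider the group $\mathbb{Z}_N \times \mathbb{Z}_N$
and the Cayley graph $\mathcal{G}(\mathbb{Z}_N \times \mathbb{Z}_N, S)$ where $S =
\{(1, 0); (0, 0); (0, 1)\}$. Again consider the uniform probability
distribution

$$\pi((0, 0)) = \pi((1, 0)) = \pi((0, 1)) = 1/3.$$

The corresponding Cayley stochastic matrix is given by the
following block circulant matrix belonging to $\mathbb{R}^{N^2 \times N^2}$

$$P = \begin{pmatrix}
P_1 & 0 & \cdots & 0 & P_2 \\
P_2 & P_1 & 0 & \cdots & 0 \\
0 & P_2 & P_1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & P_2 & P_1
\end{pmatrix} \qquad (14)$$

where $P_1, P_2 \in \mathbb{R}^{N \times N}$ are such that

$$P_1 = \begin{pmatrix}
1/3 & 0 & \cdots & 0 & 1/3 \\
1/3 & 1/3 & 0 & \cdots & 0 \\
\vdots & \ddots & \ddots & & \vdots \\
0 & \cdots & 0 & 1/3 & 1/3
\end{pmatrix}, \qquad P_2 = \frac{1}{3} I. \qquad (15)$$

This example can be generalized to the more general case
of the discrete $d$-dimensional tori $\mathbb{Z}_N^d$, extensively studied in
the literature on peer-to-peer networks [21], [25].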

Now we recall an interesting result regarding the essential
spectral radius of Cayley stochastic matrices. Assume
that $P \in \mathbb{R}^{N \times N}$ is a Cayley stochastic matrix generated
by a suitable $\pi$ and assume that $|S| = v$, where $S$ is as
previously defined. Moreover assume that $0 \in S$. Notice
that this last fact implies that $P_{ii} > 0$ for all $i$ with $1 \le i \le N$.
Then it follows that $\rho \ge 1 - C/N^{2/(v-1)}$, where $C > 0$ is
a constant independent of $S$ and $N$ is the number of agents.
This result was proved in [6].

On the other hand, the block Kronecker strategy constructed
from any matrix $A$ has essential spectral radius
$|\lambda|^{1/k}$, where $|\lambda|$ is the essential spectral radius of $A$, as
stated in Corollary 4.1.

If $0 < |\lambda| < 1$, then $|\lambda|^{1/k}$ behaves like $1 - \mu/k$ for large
$k$ and some $\mu > 0$. Recall that $k$ is $\log N / \log n$. Hence this is
better than abelian Cayley strategies.
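The asymptotic claim can be made explicit with a first-order expansion. The following short derivation is a sketch; the symbol $\mu$ is identified here with $-\ln|\lambda|$, which is one natural reading of "some $\mu$" and is stated as an assumption.

```latex
\begin{align*}
|\lambda|^{1/k} &= \exp\!\Big(\frac{\ln|\lambda|}{k}\Big)
  = 1 - \frac{\mu}{k} + O\!\Big(\frac{1}{k^{2}}\Big),
  \qquad \mu := -\ln|\lambda| > 0, \\
k = \frac{\log N}{\log n}
  \;&\Longrightarrow\;
  1 - |\lambda|^{1/k} \approx \mu\,\frac{\log n}{\log N}.
\end{align*}
```

The distance of the essential spectral radius from one therefore shrinks only logarithmically in $N$ for block Kronecker strategies, whereas the lower bound recalled for Cayley strategies forces that distance to shrink at least as fast as a negative power of $N$.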

In conclusion, block Kronecker strategies have a better
essential spectral radius, hence a quicker convergence speed,
than Cayley strategies. For the particular choice given by
Equation (12), we converge in finite time, and this time is
the smallest possible over all linear strategies with the same
constrained degree.

B. Simulation result 

As an illustration, we present a simulation-based comparison
between the Cayley strategy and the block Kronecker strategy.
The network considered consists of $N = 81$ agents.
The matrix $P$ for the Cayley strategy is the matrix (13),
whereas the matrix $M$ for the block Kronecker strategy is
built starting from

$$A = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$$

with $n = 3$ and $k = 4$. The initial conditions have been
chosen randomly inside the interval $[-50, 50]$. In both cases
the in-degree is 3. Notice that, as depicted in Figure 1, the
block Kronecker strategy reaches the average of the initial
conditions in a finite number of steps, whereas the Cayley
strategy, after the same number of steps, is still far from
converging toward the meeting point.
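The experiment can be re-created along the following lines. This is a sketch, not the authors' simulation code: the seed, the function names and the use of the maximal deviation from the average as the error measure are all assumptions. Both updates are written directly from the neighbour structure instead of forming the $81 \times 81$ matrices.

```python
import random

def de_bruijn_step(x, v):
    """x+ = M x for M built from A = (1/v) 1 1^T:
    agent i averages agents v*i, ..., v*i + v - 1 (mod N)."""
    N = len(x)
    return [sum(x[(v * i + d) % N] for d in range(v)) / v for i in range(N)]

def ring_step(x):
    """x+ = P x for the Cayley matrix (13): agent i averages agents i-1, i, i+1 (mod N)."""
    N = len(x)
    return [(x[i - 1] + x[i] + x[(i + 1) % N]) / 3 for i in range(N)]

random.seed(0)                       # arbitrary seed, only for reproducibility
N, v, k = 81, 3, 4
x0 = [random.uniform(-50, 50) for _ in range(N)]
average = sum(x0) / N

x_kronecker, x_cayley = x0[:], x0[:]
for _ in range(k):
    x_kronecker = de_bruijn_step(x_kronecker, v)
    x_cayley = ring_step(x_cayley)

err_kronecker = max(abs(xi - average) for xi in x_kronecker)
err_cayley = max(abs(xi - average) for xi in x_cayley)
```

After $k = 4$ steps `err_kronecker` is zero up to floating-point rounding, while `err_cayley` is still of the order of the spread of the initial conditions.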



We get a lower bound on J\ by summing only the first k 
terms: 





k-l 



Fig. 1. The block Kronecker strategy (left) converges in finite time, while 
the Cayley strategy (right) has a relatively slow convergence 



VI. LQR COST 

In this section we want to evaluate the performance of 
the block Kronecker strategy according to the quadratic cost 
J = Ji + 7J2, where Ji = EEi>o( s; W — x(oo)) T (x(t) — 
x(oo)) accounts for the quickness of convergence, J2 — 
EY, t>0 (x(t+l)-x(t)) T (x(t+l)-x(t)) limits ±e 

norm of 

the updates, and 7 weights the respective importance of those 
two factors. Precisely we evaluate J for any block Kronecker 
strategy derived from a normal matrix A. Remember that the 
initial state x(0) is supposed to be characterized by a identity 
covariance matrix. We start with a lemma which provides an 
upper and a lower bound for J\ . 

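Lemma 6.1: If $A$ is a normal $n \times n$ matrix satisfying conditions (A), (B), (C), and $\rho$ is the essential spectral radius of $A$, then the $J_1$ cost of the corresponding block Kronecker strategy of size $N = n^k$ satisfies

$$J_L \le J_1 \le J_U,$$

where

$$J_L = N \, \frac{1 - (\operatorname{Tr}(A^T A)/n)^k}{1 - \operatorname{Tr}(A^T A)/n} - k \quad \text{and} \quad J_U = J_L + \frac{k}{1 - \rho^2} \, (\operatorname{Tr} A^T A - 1).$$

Proof: Classical arguments lead to

$$\begin{aligned}
J_1 &= \mathbb{E} \sum_{t \ge 0} (x(t) - x(\infty))^T (x(t) - x(\infty)) \\
&= \sum_{t \ge 0} \mathbb{E} \, (x(t) - x(\infty))^T (x(t) - x(\infty)) \\
&= \sum_{t \ge 0} \mathbb{E} \operatorname{Tr} \, (x(t) - x(\infty))^T (x(t) - x(\infty)) \\
&= \sum_{t \ge 0} \mathbb{E} \operatorname{Tr} \, (x(t) - x(\infty)) (x(t) - x(\infty))^T \\
&= \sum_{t \ge 0} \operatorname{Tr} \, (M^t - E) \, \mathbb{E}(x(0) x(0)^T) \, (M^t - E)^T \\
&= \sum_{t \ge 0} \operatorname{Tr} \, (M^t - E)(M^t - E)^T,
\end{aligned}$$

with $E = \frac{1}{n^k} \mathbf{1}\mathbf{1}^T$.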
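Now, $\operatorname{Tr} (M^t - E)(M^t - E)^T = \operatorname{Tr} M^t M^{tT} - \operatorname{Tr} E = \operatorname{Tr} M^t M^{tT} - 1$. When $M$ is derived by block Kronecker product from a normal matrix $A$, the trace $\operatorname{Tr} M^t M^{tT}$ is equal to $(\operatorname{Tr} (A^T A)^r)^{k-s} (\operatorname{Tr} (A^T A)^{r+1})^s$ if $t = rk + s$, according to Proposition 4.2. Hence

$$J_1 = \sum_{r \ge 0} \sum_{s=0}^{k-1} \left( (\operatorname{Tr} (A^T A)^r)^{k-s} (\operatorname{Tr} (A^T A)^{r+1})^s - 1 \right).$$

We get a lower bound on $J_1$ by summing only the first $k$ terms:

$$\begin{aligned}
J_1 &\ge \sum_{s=0}^{k-1} (\operatorname{Tr} (A^T A)^0)^{k-s} (\operatorname{Tr} (A^T A)^1)^s - k \qquad (16) \\
&= \sum_{s=0}^{k-1} n^{k-s} (\operatorname{Tr} A^T A)^s - k.
\end{aligned}$$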



The last summation is a geometric series that can be evaluated, leading to the bound

$$J_1 \ge N \, \frac{1 - (\operatorname{Tr} A^T A / n)^k}{1 - \operatorname{Tr} A^T A / n} - k.$$

This proves the left inequality in the claim. 
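The left inequality can be checked numerically from the trace formula alone. The sketch below is an illustration under stated assumptions: it takes the symmetric circulant matrix $A$ with rows $(1/2, 1/4, 1/4)$, whose eigenvalues are $1, 1/4, 1/4$, and assumes that this matrix satisfies conditions (A), (B), (C); the function names are hypothetical.

```python
from fractions import Fraction

def trace_power(lams, r):
    """Tr (A^T A)^r = 1 + sum_i lam_i^r, where lam_i are the eigenvalues of
    A^T A other than the eigenvalue 1 (A normal)."""
    return 1 + sum(lam ** r for lam in lams)

def block_sum(lams, k, r):
    """Sum over s = 0..k-1 of the terms of J_1 with t = r*k + s."""
    a, b = trace_power(lams, r), trace_power(lams, r + 1)
    return sum(a ** (k - s) * b ** s - 1 for s in range(k))

def lower_bound(n, k, trace_ata):
    """Closed form N (1 - (Tr A^T A / n)^k) / (1 - Tr A^T A / n) - k, N = n**k."""
    ratio = Fraction(trace_ata) / n
    return n ** k * (1 - ratio ** k) / (1 - ratio) - k

n, k = 3, 4
lams = [Fraction(1, 16), Fraction(1, 16)]   # squared moduli of the eigenvalues 1/4, 1/4
trace_ata = trace_power(lams, 1)            # equals 9/8, the sum of squared entries of A

first_k_terms = block_sum(lams, k, 0)       # the terms kept in (16)
J_L = lower_bound(n, k, trace_ata)
J1_truncated = sum(block_sum(lams, k, r) for r in range(30))  # partial sum of J_1

print(float(first_k_terms), float(J_L), float(J1_truncated))
```

The first $k$ terms coincide exactly with the closed form, and every neglected term is non-negative, so any partial sum of $J_1$ stays above $J_L$.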

For the right inequality, we find an upper bound on the terms neglected in the lower bound (16). As normal matrices can be diagonalized by a unitary transformation, the eigenvalues of $A^T A$, which we denote $1, \lambda_1, \lambda_2, \dots, \lambda_{n-1}$, are precisely the squared moduli of the eigenvalues of $A$. In particular, $\rho^2 = \lambda_1$, and the trace of $(A^T A)^t$ is $1 + \sum_i \lambda_i^t$.

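The terms neglected in the lower bound (16) are

$$\sum_{r \ge 1} \sum_{s=0}^{k-1} \left( (\operatorname{Tr} (A^T A)^r)^{k-s} (\operatorname{Tr} (A^T A)^{r+1})^s - 1 \right).$$

For every $r$, we bound each of the $k$ terms by the largest one (for which $s = 0$). Hence the neglected terms are bounded from above by

$$\sum_{r \ge 1} k \left( \Big( 1 + \sum_i \lambda_i^r \Big)^k - 1 \right) = k \sum_{r \ge 1} P(\lambda_1^r, \dots, \lambda_{n-1}^r),$$

where $P$ is a polynomial in the variables $\lambda_1, \dots, \lambda_{n-1}$ with no constant term: all monomials have degree at least one. Now we can sum all corresponding monomials for $r = 1, 2, \dots$: this is a geometric series of ratio at most $\lambda_1$. Hence $\sum_{r \ge 1} P(\lambda_1^r, \dots, \lambda_{n-1}^r)$ is at most $\frac{1}{1 - \lambda_1} P(\lambda_1, \dots, \lambda_{n-1}) = \frac{1}{1 - \rho^2} (\operatorname{Tr} A^T A - 1)$.

Hence $J_1$ differs from our lower bound by at most $\frac{k}{1 - \rho^2} (\operatorname{Tr} A^T A - 1)$. ∎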

Thus $J_1 = N \, \frac{1 - (\operatorname{Tr} A^T A / n)^k}{1 - \operatorname{Tr} A^T A / n} + O(\log N)$. Now we estimate $J_2$.

Lemma 6.2: Under the assumptions of Lemma 6.1, if $\mu_i$ denote the eigenvalues of $A$ different from one,

$$2 J_1 - N - 2 \sum_i \frac{1}{1 - |\mu_i|^2} \le J_2 \le 2 J_1 - N.$$



Proof: First we notice, adapting the first steps of the proof of Lemma 6.1, that $J_2 = \sum_{t \ge 0} \operatorname{Tr} (M - I)^T (M^T)^t M^t (M - I)$, with $I$ the identity. This involves terms of the form $(M^T)^{t+1} M^t$. More precisely,

$$\begin{aligned}
J_2 &= \sum_{t \ge 0} \operatorname{Tr} \left( M^{T(t+1)} M^{t+1} - M^{T(t+1)} M^t - M^{Tt} M^{t+1} + M^{Tt} M^t \right) \\
&= 2 \sum_{t \ge 0} (\operatorname{Tr} M^{Tt} M^t - 1) - N - 2 \sum_{t \ge 0} (\operatorname{Tr} M^{Tt} M^{t+1} - 1).
\end{aligned}$$

The first term of the last member is precisely $2 J_1$; the last term is, thanks to Proposition 4.2, $2 \sum_{t \ge 0} (\operatorname{Tr} A^{Tt} A^{t+1} - 1)$.

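From the Cauchy-Schwarz inequality applied to the Frobenius norm, $\operatorname{Tr} A^{Tt} A^{t+1} \le \sqrt{\operatorname{Tr} A^{T(t+1)} A^{t+1} \, \operatorname{Tr} A^{Tt} A^t} \le \operatorname{Tr} A^{Tt} A^t$.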
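The two inequalities can be spelled out as follows. This is only a sketch: $\langle X, Y \rangle = \operatorname{Tr} X^T Y$ denotes the Frobenius inner product (a notation introduced here for convenience), and the $\lambda_i$ are assumed to lie in $[0, 1]$, as for the matrices considered in Lemma 6.1.

```latex
\begin{align*}
\operatorname{Tr} A^{Tt} A^{t+1} = \langle A^{t}, A^{t+1} \rangle
  &\le \|A^{t}\|_F \, \|A^{t+1}\|_F
   = \sqrt{\operatorname{Tr} A^{Tt} A^{t} \; \operatorname{Tr} A^{T(t+1)} A^{t+1}}, \\
\operatorname{Tr} A^{T(t+1)} A^{t+1} = 1 + \sum_i \lambda_i^{t+1}
  &\le 1 + \sum_i \lambda_i^{t} = \operatorname{Tr} A^{Tt} A^{t}.
\end{align*}
```

The first line is the Cauchy-Schwarz inequality; the second uses the normality of $A$, which gives $\operatorname{Tr} A^{Tt} A^t = \operatorname{Tr} (A^T A)^t$. Replacing the factor $\operatorname{Tr} A^{T(t+1)} A^{t+1}$ under the square root by the larger quantity $\operatorname{Tr} A^{Tt} A^t$ yields the stated bound.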

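Hence $\sum_{t \ge 0} (\operatorname{Tr} A^{Tt} A^{t+1} - 1) \le \sum_{t \ge 0} \sum_i \lambda_i^t = \sum_i \frac{1}{1 - \lambda_i}$, where, as argued in the proof of Lemma 6.1, $\lambda_i = |\mu_i|^2$.

Hence,

$$J = N \left( (1 + 2\gamma) \, \frac{1}{1 - \operatorname{Tr} (A^T A)/n} - \gamma + O(\log N / N) \right).$$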

Since the trace of $A^T A$ is the sum of the squares of the elements of $A$, we see that the coefficient of $N$ (neglecting the $O(\log N / N)$ term) is optimized by the matrix $A = \frac{1}{n} \mathbf{1}\mathbf{1}^T$, whatever the value of $\gamma$. In this case, the lower bound obtained on $J_1$ is exact, since only $k$ terms are non-zero. The optimal cost is then

$$J = N (1 + 2\gamma) \, \frac{1 - 1/N}{1 - 1/\nu} - \dots$$

with $\nu = n$.

Hence there is no trade-off here between $J_1$ and $J_2$ among the family of block Kronecker strategies, in contrast with the general LQR theory.

Note that the optimal control strategy for unconstrained degree (every agent knows every position) is easily obtained from a scalar algebraic Riccati equation, leading to the optimal cost $J = N (1 + \sqrt{1 + 4\gamma})/2$. If $\gamma$ is small and $n$ is large, then the optimal finite-time block Kronecker strategy approaches the unbounded-degree optimal solution, with a cost approximately equal to $(1 + \gamma) N$.
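The Riccati value can be checked with a short sketch. The scalar model used here is an assumption made for illustration: each deviation from the average obeys $z(t+1) = z(t) + u(t)$ with stage cost $z(t)^2 + \gamma u(t)^2$, so that the discrete-time algebraic Riccati equation reads $p = 1 + p - p^2/(\gamma + p)$, i.e. $p^2 = p + \gamma$. The function names are hypothetical.

```python
import math

def riccati_fixed_point(gamma, iterations=200):
    """Iterate the scalar Riccati recursion p <- 1 + p - p^2 / (gamma + p)
    (system a = b = 1, state weight 1, input weight gamma), starting from p = 1."""
    p = 1.0
    for _ in range(iterations):
        p = 1.0 + p - p * p / (gamma + p)
    return p

def closed_form(gamma):
    """Positive root of p^2 = p + gamma, the per-agent factor in N (1 + sqrt(1 + 4 gamma)) / 2."""
    return (1.0 + math.sqrt(1.0 + 4.0 * gamma)) / 2.0

for gamma in (0.01, 1.0, 10.0):
    print(gamma, riccati_fixed_point(gamma), closed_form(gamma))
```

For small $\gamma$ the factor $(1 + \sqrt{1 + 4\gamma})/2$ is close to $1 + \gamma$, which is the approximation used in the comparison above.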

VII. Conclusions 

We have introduced a family of strategies for a consensus 
problem, whose graph of communication is de Bruijn's 
graph. We have shown that they can converge in finite 
logarithmic time, which is optimal. We have evaluated the 
LQR cost of these strategies, proving their quasi-optimality 
if the cost of update is small and the degree of the graph not 
too low. 

This work can be extended in several directions, including: 

• designing strategies valid for any N, not only exact 
powers of n; 

• tackling the continuous-time case, where no deadbeat 
strategy can exist; 

• estimating the LQR cost for Cayley strategies; 



• finding strategies that minimize the LQR cost for any
cost $\gamma$ of the update.